Tag

#language modeling

1 article

NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

NVIDIA's Gated DeltaNet-2 decouples the erase and write operations of the delta rule in linear attention and outperforms models such as Mamba-2 and KDA on long-context tasks.